ABSTRACT
Public health researchers and practitioners commonly infer phylogenies from viral genome sequences to understand transmission dynamics and identify clusters of genetically related samples. However, viruses that reassort or recombine violate phylogenetic assumptions and require more sophisticated methods. Even when phylogenies are appropriate, they can be unnecessary or difficult to interpret without specialized knowledge. For example, pairwise distances between sequences can be enough to identify clusters of related samples or to assign new samples to existing phylogenetic clusters. In this work, we tested whether dimensionality reduction methods could capture known genetic groups within two human pathogenic viruses that cause substantial morbidity and mortality: seasonal influenza A/H3N2, which frequently reassorts, and SARS-CoV-2, which frequently recombines. We applied principal component analysis (PCA), multidimensional scaling (MDS), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP) to sequences with well-defined phylogenetic clades and either reassortment (H3N2) or recombination (SARS-CoV-2). For each low-dimensional embedding of sequences, we calculated the correlation between pairwise genetic distances and Euclidean distances in the embedding, and we applied a hierarchical clustering method to identify clusters in the embedding. We measured the accuracy of these clusters against previously defined phylogenetic clades, reassortment clusters, or recombinant lineages. We found that MDS maintained the strongest correlation between pairwise genetic and Euclidean distances and best captured the intermediate placement of recombinant lineages between their parental lineages. Clusters from t-SNE most accurately recapitulated known phylogenetic clades and recombinant lineages. Both MDS and t-SNE accurately identified reassortment groups.
We show that simple statistical methods without a biological model can accurately represent known genetic relationships for relevant human pathogenic viruses. Our open source implementation of these methods for analysis of viral genome sequences can be easily applied when phylogenetic methods are either unnecessary or inappropriate.
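The distance-correlation and clustering steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes equal-length aligned sequences, uses Hamming distance as the genetic distance, substitutes classical (Torgerson) MDS for whichever MDS variant the study used, uses SciPy average-linkage clustering as a stand-in for "a hierarchical clustering method", and scores agreement with known clades using scikit-learn's adjusted Rand index, which may differ from the accuracy metric in the study. The function names `hamming_matrix`, `classical_mds`, and `embed_and_evaluate` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics import adjusted_rand_score


def hamming_matrix(sequences):
    """Pairwise number of mismatched positions between equal-length aligned sequences."""
    chars = np.array([list(seq) for seq in sequences])
    return (chars[:, None, :] != chars[None, :, :]).sum(axis=2).astype(float)


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: embed a distance matrix via eigendecomposition."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues returned in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


def embed_and_evaluate(sequences, clade_labels, n_clusters, n_components=2):
    genetic = hamming_matrix(sequences)
    embedding = classical_mds(genetic, n_components)

    # Pearson correlation between pairwise genetic and Euclidean distances,
    # both in condensed (upper-triangle, row-major) order.
    genetic_condensed = squareform(genetic, checks=False)
    euclidean_condensed = pdist(embedding)
    r = np.corrcoef(genetic_condensed, euclidean_condensed)[0, 1]

    # Average-linkage hierarchical clustering cut into n_clusters groups.
    tree = linkage(embedding, method="average")
    clusters = fcluster(tree, t=n_clusters, criterion="maxclust")

    # Agreement between recovered clusters and known clade labels.
    ari = adjusted_rand_score(clade_labels, clusters)
    return embedding, r, clusters, ari


# Toy example: two invented "clades" of short aligned sequences.
seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT",
        "CCCCCCCCCC", "CCCCCCCCCG", "CCCCCCCCGG"]
clades = ["A", "A", "A", "B", "B", "B"]
embedding, r, clusters, ari = embed_and_evaluate(seqs, clades, n_clusters=2)
```

Classical MDS was chosen here only because it is deterministic and needs nothing beyond NumPy; PCA, t-SNE, or UMAP embeddings could be passed through the same correlation and clustering steps.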
ABSTRACT
Investment in SARS-CoV-2 genotyping in Africa over the past year has led to a massive increase in the number of sequences, with more than 100,000 genomes generated to track the pandemic on the continent. Our results show an increase in the number of African countries able to sequence within their own borders, coupled with a decrease in sequencing turnaround time. Findings from this genomic surveillance underscore the heterogeneous nature of the pandemic but also reveal repeated dissemination of SARS-CoV-2 variants within the continent. Sustained investment in genomic surveillance in Africa is needed as the virus continues to evolve, particularly given low vaccination coverage. These investments are also crucial for preparedness and response to future pathogen outbreaks.
ABSTRACT
Despite the appearance of SARS-CoV-2 variants with altered receptor-binding or antigenic phenotypes, traditional methods for detecting adaptive evolution from sequence data do not pick up strong signals of positive selection. Here, we present a new method for identifying adaptive evolution on short evolutionary time scales in densely sampled populations. We apply this method to SARS-CoV-2 to perform a comprehensive analysis of adaptively evolving regions of the genome. We find that spike S1 is a focal point of adaptive evolution, but we also identify positively selected mutations in other genes that are sculpting the evolutionary trajectory of SARS-CoV-2. Protein-coding mutations in S1 are temporally clustered, and in 2021 the ratio of nonsynonymous to synonymous divergence in S1 was more than 4 times greater than in the equivalent influenza HA1 subunit.
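The nonsynonymous-to-synonymous divergence ratio mentioned above normalizes each class of substitution by the number of sites at which it could occur. The sketch below is not the authors' new method; it is a simplified, textbook-style (Nei-Gojobori-like) calculation for two aligned coding sequences. It assumes the standard genetic code and unambiguous A/C/G/T bases, counts sites on the ancestral sequence only, treats changes to stop codons as nonsynonymous, skips codons with more than one difference, and applies no multiple-hit correction. `site_counts` and `dn_ds` are hypothetical names.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def site_counts(codon):
    """Return (synonymous, nonsynonymous) sites for one codon.

    Each of the 3 positions contributes 1/3 of a synonymous site for every
    alternative base that leaves the encoded amino acid unchanged.
    """
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                syn += 1 / 3
    return syn, 3 - syn


def dn_ds(ancestor, descendant):
    """Ratio of nonsynonymous to synonymous divergence between two aligned coding sequences."""
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, len(ancestor) - len(ancestor) % 3, 3):
        anc, desc = ancestor[i:i + 3], descendant[i:i + 3]
        s, n = site_counts(anc)
        syn_sites += s
        nonsyn_sites += n
        if sum(a != b for a, b in zip(anc, desc)) == 1:  # single-difference codons only
            if CODON_TABLE[anc] == CODON_TABLE[desc]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    if syn_diffs == 0:
        return float("inf") if nonsyn_diffs else float("nan")
    return (nonsyn_diffs / nonsyn_sites) / (syn_diffs / syn_sites)


# Toy example: CTG->CTA is synonymous (Leu), AAA->AGA is nonsynonymous (Lys->Arg).
ratio = dn_ds("ATGCTGAAA", "ATGCTAAGA")
```

In this toy case the ancestral sequence has 5/3 synonymous and 22/3 nonsynonymous sites, so one change of each kind gives a ratio of (3/22)/(3/5) = 5/22, well below 1, because far fewer synonymous than nonsynonymous changes are possible.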
ABSTRACT
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused substantially more infections, deaths, and economic disruption than the 2002-2003 SARS-CoV. The key to understanding SARS-CoV-2's higher infectivity may lie in its host receptor recognition mechanism: experiments show that human ACE2, the primary receptor for both CoVs, binds the CoV-2 spike protein 5- to 20-fold more strongly than the SARS-CoV spike protein. The molecular basis for this difference in binding affinity, however, remains unexplained; in fact, a comparison of X-ray structures suggests the opposite. To gain insight, we use all-atom molecular dynamics simulations. Free energy calculations indicate that CoV-2's higher affinity is due primarily to differences in specific spike residues local to the spike-ACE2 interface, although there are also allosteric effects in binding. Comparative analysis of equilibrium simulations reveals that while the CoV and CoV-2 spike-ACE2 complexes have similar interfacial topologies, the CoV-2 spike protein forms hydrogen bonds and salt bridges with ACE2 in greater numbers, in more combinations, and with higher probabilities. We attribute CoV-2's higher affinity to these differences in polar contacts, and these findings also highlight the importance of thermal structural fluctuations in spike-ACE2 complexation. We anticipate that these findings will also inform the design of spike-ACE2 peptide blockers that, as in the cases of HIV and influenza, can serve as antivirals.
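Counting polar contacts across an equilibrium trajectory, as described above, comes down to tracking which oppositely charged or donor/acceptor atom pairs are close in each frame. The NumPy sketch below is a simplification and not the authors' analysis: it uses a single distance cutoff (4.0 Å here, a commonly used salt-bridge criterion) with no angle criterion, so it approximates salt bridges rather than true hydrogen bonds, and it takes pre-extracted coordinate arrays instead of reading a trajectory file. It reports per-pair occupancy (the fraction of frames in contact), the number of contacts per frame, and the number of distinct contact combinations observed. `contact_statistics` is a hypothetical name.

```python
import numpy as np


def contact_statistics(spike_xyz, ace2_xyz, cutoff=4.0):
    """Distance-based contact statistics between two atom sets over a trajectory.

    spike_xyz: array of shape (n_frames, n_spike_atoms, 3), coordinates in angstroms
    ace2_xyz:  array of shape (n_frames, n_ace2_atoms, 3), coordinates in angstroms
    """
    spike_xyz = np.asarray(spike_xyz, dtype=float)
    ace2_xyz = np.asarray(ace2_xyz, dtype=float)
    # Pairwise distances in every frame: shape (n_frames, n_spike_atoms, n_ace2_atoms).
    dist = np.linalg.norm(spike_xyz[:, :, None, :] - ace2_xyz[:, None, :, :], axis=-1)
    in_contact = dist <= cutoff

    occupancy = in_contact.mean(axis=0)  # fraction of frames each pair is in contact
    contacts_per_frame = in_contact.sum(axis=(1, 2))
    # Number of distinct sets of simultaneously formed contacts ("combinatorics").
    n_patterns = len({frame.tobytes() for frame in in_contact})
    return occupancy, contacts_per_frame, n_patterns


# Toy trajectory: 2 frames, 1 spike atom, 2 ACE2 atoms (made-up coordinates).
spike = [[[0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]]]
ace2 = [[[3.0, 0.0, 0.0], [10.0, 0.0, 0.0]],
        [[5.0, 0.0, 0.0], [3.5, 0.0, 0.0]]]
occupancy, contacts_per_frame, n_patterns = contact_statistics(spike, ace2)
```

Comparing these three quantities between two complexes mirrors, in a crude way, the "numbers, combinatorics and probabilities" framing; a real analysis would select charged side-chain atoms and add geometric hydrogen-bond criteria.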
Subject(s): Coronavirus Infections, HIV Infections, Severe Acute Respiratory Syndrome
ABSTRACT
Antibodies targeting the SARS-CoV-2 spike receptor-binding domain (RBD) are being developed as therapeutics and make a major contribution to the neutralizing antibody response elicited by infection. Here, we describe a deep mutational scanning method to map how all amino-acid mutations in the RBD affect antibody binding, and we apply this method to 10 human monoclonal antibodies. The escape mutations cluster on several surfaces of the RBD that broadly correspond to structurally defined antibody epitopes. However, even antibodies targeting the same RBD surface often have distinct escape mutations. The complete escape maps predict which mutations are selected during viral growth in the presence of single antibodies, and they enable us to design escape-resistant antibody cocktails, including cocktails of antibodies that compete for binding to the same surface of the RBD but have different escape mutations. Therefore, complete escape-mutation maps enable rational design of antibody therapeutics and assessment of the antigenic consequences of viral evolution.
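The cocktail-design logic above can be illustrated with a toy calculation: a virus escapes a cocktail only if a mutation escapes every antibody in it, so one reasonable score for a mutation against a cocktail is the minimum of its per-antibody escape scores, and a cocktail's vulnerability is the worst such score over all mutations. This is a simplified heuristic for illustration, not the authors' design procedure; it assumes escape scores in [0, 1], treats a mutation missing from a map as causing no escape, and considers only single mutations. The antibody names, mutation labels, and scores below are invented placeholders, not data from the study, and `cocktail_escape` and `best_pair` are hypothetical names.

```python
from itertools import combinations


def cocktail_escape(escape, antibodies):
    """Worst-case escape score of a cocktail over all single mutations.

    escape maps antibody -> {mutation: escape score in [0, 1]}.
    A mutation escapes the cocktail only as well as it escapes the most
    resistant member, hence the min over antibodies. Missing entries are
    treated as no escape (an assumption).
    """
    mutations = set().union(*(escape[ab] for ab in antibodies))
    return max(min(escape[ab].get(m, 0.0) for ab in antibodies) for m in mutations)


def best_pair(escape):
    """Two-antibody cocktail with the lowest worst-case escape score."""
    return min(combinations(sorted(escape), 2), key=lambda pair: cocktail_escape(escape, pair))


# Invented toy escape maps (not real measurements).
escape = {
    "ab1": {"mutA": 0.9, "mutB": 0.1, "mutC": 0.0},
    "ab2": {"mutA": 0.8, "mutB": 0.2, "mutC": 0.1},
    "ab3": {"mutA": 0.0, "mutB": 0.9, "mutC": 0.1},
}
pair = best_pair(escape)
```

Here ab1 and ab2 share a strong escape mutation (mutA), so pairing them leaves a worst-case score of 0.8, whereas ab1 and ab3 escape through different mutations and cover each other, which is the intuition behind combining antibodies with different escape mutations.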
ABSTRACT
Following its emergence in Wuhan, China, in late November or early December 2019, the SARS-CoV-2 virus spread rapidly throughout the world. On March 11, 2020, the World Health Organization declared Coronavirus Disease 2019 (COVID-19) a pandemic. Genome sequencing of SARS-CoV-2 strains allows for the reconstruction of the transmission history connecting these infections. Here, we analyze 346 SARS-CoV-2 genomes from samples collected between 20 February and 15 March 2020 from infected patients in Washington State, USA. We found that the large majority of SARS-CoV-2 infections sampled during this time frame appeared to derive from a single introduction event into the state in late January or early February 2020 followed by local spread, strongly suggesting cryptic spread of COVID-19 during January and February 2020, before active community surveillance was implemented. We estimate that the common ancestor of this outbreak clade existed between 18 January and 9 February 2020. From genomic data, we estimate an epidemic doubling time of between 2.4 and 5.1 days. These results highlight the need for large-scale community surveillance for SARS-CoV-2 introductions and spread, and they demonstrate the power of pathogen genomics to inform epidemiological understanding.
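For reference, an exponential growth rate r (per day) and a doubling time T_d are related by T_d = ln 2 / r, so doubling times of 2.4 and 5.1 days correspond to growth rates of roughly 0.29 and 0.14 per day. The authors derived their estimate from genomic data; the sketch below instead shows the simpler, commonly used approach of fitting a log-linear model to case counts, assuming clean exponential growth over the fitting window. The `doubling_time` function and the synthetic counts are hypothetical.

```python
import numpy as np


def doubling_time(days, counts):
    """Doubling time from a least-squares fit of log(counts) against time."""
    growth_rate, _intercept = np.polyfit(np.asarray(days, dtype=float),
                                         np.log(np.asarray(counts, dtype=float)), deg=1)
    return np.log(2) / growth_rate


# Synthetic counts that double exactly every 3 days (illustrative only).
days = np.arange(10)
counts = 10 * 2 ** (days / 3)
td = doubling_time(days, counts)
```

Real surveillance counts are noisy and affected by changes in testing, which is one reason genomic (phylodynamic) estimates such as the one reported above can be a useful complement.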